深度神经网络在预测质量方面表现出巨大的成功,而可靠且稳健的不确定性估计仍然是一个挑战。预测性不确定性补充模型预测,并实现了下游任务的改进功能,包括嵌入式和移动应用,例如虚拟现实,增强现实,传感器融合和感知。这些应用程序通常需要复杂性的妥协,以获得由于内存非常有限和计算资源而导致的不确定性估计。我们通过使用Axolotl框架构建Monte Carlo辍学(MCDO)模型来解决这个问题;具体而言,我们多样化采样的子网,利用辍学模式,并使用分支技术来提高预测性能,同时保持快速计算。我们在使用CIFAR10 DataSet上进行(1)多级分类任务的实验,(2)更复杂的人体分段任务。我们的结果表明我们的方法通过接近深度集成预测质量和不确定性估算来达到效果,同时仍在实现资源限制的移动平台的推断。
translated by 谷歌翻译
已知现代卷积神经网络(CNNS)在校准上对看不见的输入数据的校准方面是过度自由度。也就是说,它们比他们准确更自信。如果预测的概率用于下游决策,则这是不希望的。在考虑精度时,CNN也令人惊讶地对压缩技术(例如量化)令人惊讶地稳健,这旨在降低计算和内存成本。我们表明,这种稳健性可以通过现代CNN的校准行为来部分解释,并且可以通过过度步骤来改进。这是由于直观的结果:低置信度预测更容易改变后量化,而不太准确。高信任预测将更加准确,但更难以改变。因此,产生后量化精度的最小降低。这提出了神经网络设计中的潜在冲突:过度频率的校准可能导致量化更好的鲁棒性。我们在CIFAR-100和ImageNet数据集上执行将训练后量化的实验应用于各种CNN。
translated by 谷歌翻译
Landing an unmanned aerial vehicle unmanned aerial vehicle (UAV) on top of an unmanned surface vehicle (USV) in harsh open waters is a challenging problem, owing to forces that can damage the UAV due to a severe roll and/or pitch angle of the USV during touchdown. To tackle this, we propose a novel model predictive control (MPC) approach enabling a UAV to land autonomously on a USV in these harsh conditions. The MPC employs a novel objective function and an online decomposition of the oscillatory motion of the vessel to predict, attempt, and accomplish the landing during near-zero tilt of the landing platform. The nonlinear prediction of the motion of the vessel is performed using visual data from an onboard camera. Therefore, the system does not require any communication with the USV or a control station. The proposed method was analyzed in numerous robotics simulations in harsh and extreme conditions and further validated in various real-world scenarios.
translated by 谷歌翻译
Language modeling, a central task in natural language processing, involves estimating a probability distribution over strings. In most cases, the estimated distribution sums to 1 over all finite strings. However, in some pathological cases, probability mass can ``leak'' onto the set of infinite sequences. In order to characterize the notion of leakage more precisely, this paper offers a measure-theoretic treatment of language modeling. We prove that many popular language model families are in fact tight, meaning that they will not leak in this sense. We also generalize characterizations of tightness proposed in previous works.
translated by 谷歌翻译
After just a few hundred training updates, a standard probabilistic model for language generation has likely not yet learnt many semantic or syntactic rules of natural language, which inherently makes it difficult to estimate the right probability distribution over next tokens. Yet around this point, these models have identified a simple, loss-minimising behaviour: to output the unigram distribution of the target training corpus. The use of such a crude heuristic raises the question: Rather than wasting precious compute resources and model capacity for learning this strategy at early training stages, can we initialise our models with this behaviour? Here, we show that we can effectively endow our model with a separate module that reflects unigram frequency statistics as prior knowledge. Standard neural language generation architectures offer a natural opportunity for implementing this idea: by initialising the bias term in a model's final linear layer with the log-unigram distribution. Experiments in neural machine translation demonstrate that this simple technique: (i) improves learning efficiency; (ii) achieves better overall performance; and (iii) appears to disentangle strong frequency effects, encouraging the model to specialise in non-frequency-related aspects of language.
translated by 谷歌翻译
In this work, we investigate the representation capacity of multilayer perceptron networks that use the sine as activation function - sinusoidal neural networks. We show that the layer composition in such networks compacts information. For this, we prove that the composition of sinusoidal layers expands as a sum of sines consisting of a large number of new frequencies given by linear combinations of the weights of the network's first layer. We provide the expression of the corresponding amplitudes in terms of the Bessel functions and give an upper bound for them that can be used to control the resulting approximation.
translated by 谷歌翻译
In this paper, we seek to measure how much information a component in a neural network could extract from the representations fed into it. Our work stands in contrast to prior probing work, most of which investigates how much information a model's representations contain. This shift in perspective leads us to propose a new principle for probing, the architectural bottleneck principle: In order to estimate how much information a given component could extract, a probe should look exactly like the component. Relying on this principle, we estimate how much syntactic information is available to transformers through our attentional probe, a probe that exactly resembles a transformer's self-attention head. Experimentally, we find that, in three models (BERT, ALBERT, and RoBERTa), a sentence's syntax tree is mostly extractable by our probe, suggesting these models have access to syntactic information while composing their contextual representations. Whether this information is actually used by these models, however, remains an open question.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
我们将图形神经网络训练来自小工具N体模拟的光晕目录的神经网络,以执行宇宙学参数的无现场级别可能的推断。目录包含$ \ Lessim $ 5,000 HAROS带质量$ \ gtrsim 10^{10} 〜h^{ - 1} m_ \ odot $,定期卷为$(25〜H^{ - 1} {\ rm mpc}){\ rm mpc}) ^3 $;目录中的每个光环都具有多种特性,例如位置,质量,速度,浓度和最大圆速度。我们的模型构建为置换,翻译和旋转的不变性,不施加最低限度的规模来提取信息,并能够以平均值来推断$ \ omega _ {\ rm m} $和$ \ sigma_8 $的值$ \ sim6 \%$的相对误差分别使用位置加上速度和位置加上质量。更重要的是,我们发现我们的模型非常强大:他们可以推断出使用数千个N-n-Body模拟的Halo目录进行测试时,使用五个不同的N-进行测试时,在使用Halo目录进行测试时,$ \ omega _ {\ rm m} $和$ \ sigma_8 $身体代码:算盘,Cubep $^3 $ M,Enzo,PKDGrav3和Ramses。令人惊讶的是,经过培训的模型推断$ \ omega _ {\ rm m} $在对数千个最先进的骆驼水力动力模拟进行测试时也可以使用,该模拟使用四个不同的代码和子网格物理实现。使用诸如浓度和最大循环速度之类的光环特性允许我们的模型提取更多信息,而牺牲了模型的鲁棒性。这可能会发生,因为不同的N体代码不会在与这些参数相对应的相关尺度上收敛。
translated by 谷歌翻译
酒吧 - 希利尔的结构是正式语言理论的经典结果。它通过构造表明,无上下文语言与普通语言之间的相交本身是无上下文的。但是,其原始配方(Bar-Hillel等人,1961年)都不是其加权扩展(Nederhof和Satta,2003年)都无法使用$ \ epsilon $ -Arcs处理自动机。在此简短的说明中,我们将Bar-Hillel结构概括为即使自动机包含$ \ epsilon $ -Arcs,也可以正确计算交叉路口。我们进一步证明,我们的广义结构导致语法编码输入自动机和语法的结构,同时保留原始结构的渐近尺寸。
translated by 谷歌翻译